Bayesian Full Rank Marginalization for
Two-Way Contingency Tables


1. Sampling Schemes for Two-Way Tables

It is assumed that the cell frequencies $y_{ij}$ are independent, given the corresponding cell parameters $\theta_{ij}$, and possess Poisson distributions with respective means $\theta_{ij}$, for $i = 1, \ldots, r$ and $j = 1, \ldots, s$. It is furthermore supposed that a log-linear model is appropriate and that

$$\gamma_{ij} = \log\theta_{ij} = \mu + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij}, \qquad (1.1)$$

where $\mu$, the $\lambda^A_i$, the $\lambda^B_j$, and the $\lambda^{AB}_{ij}$ respectively denote the main effect and the row, column, and interaction effects. Standard constraints of the form $\lambda^A_{\cdot} = \lambda^B_{\cdot} = \lambda^{AB}_{i\cdot} = \lambda^{AB}_{\cdot j} = 0$ will not, however, be assumed under our full rank Bayesian approach; this aspect will be considered more fully in section 2. For our full hierarchical prior approach we assume $rs - r - s + 1 \geq 6$. For lower dimensions the uninformative prior approach indicated in section 7 is more useful.


Under the above assumptions, the conditional distribution of the $y_{ij}$, given that

$$\sum_{k,g} y_{kg} = n, \qquad (1.2)$$

is multinomial with sample size $n$ and respective cell probabilities

$$\phi^{AB}_{ij} = \theta_{ij}\Big/\sum_{k,g}\theta_{kg} \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s). \qquad (1.3)$$


The analysis in the Poisson case will therefore also be appropriate for an $r \times s$ contingency table where the overall total, but no further margins, are fixed. The assumption in (1.1) may in this case be replaced by the assumption

$$\gamma^{AB}_{ij} = \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij} \qquad (1.4)$$

for the multivariate logits, which satisfy

$$\phi^{AB}_{ij} = e^{\gamma^{AB}_{ij}}\Big/\sum_{k,g} e^{\gamma^{AB}_{kg}}. \qquad (1.5)$$

No main effect $\mu$ is required in this situation, since it would cancel out in (1.5). This formulation shows the relationship between log-linear Poisson analysis (Nelder & Wedderburn, 1972) and logit analysis (Goodman, 1970).
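
A small numerical check may make (1.3) to (1.5) concrete. The Python sketch below assumes NumPy; the function names and the randomly generated effects are hypothetical. It builds the Poisson means from (1.1), normalizes them as in (1.3), and checks that changing the main effect $\mu$ leaves the cell probabilities unchanged, so that they depend only on the logits (1.4) through (1.5).

```python
import numpy as np

def overall_cell_probabilities(mu, lam_a, lam_b, lam_ab):
    """phi^{AB}_{ij} of (1.3), with theta_{ij} from the log-linear model (1.1)."""
    gamma = mu + lam_a[:, None] + lam_b[None, :] + lam_ab   # log theta_{ij}, shape (r, s)
    theta = np.exp(gamma)
    return theta / theta.sum()

def logit_probabilities(gamma_ab):
    """phi^{AB}_{ij} from the multivariate logits, as in (1.5)."""
    w = np.exp(gamma_ab - gamma_ab.max())                    # shift for numerical stability
    return w / w.sum()

rng = np.random.default_rng(0)
r, s = 3, 4
lam_a, lam_b = rng.normal(size=r), rng.normal(size=s)
lam_ab = rng.normal(scale=0.3, size=(r, s))

phi_1 = overall_cell_probabilities(0.0, lam_a, lam_b, lam_ab)
phi_2 = overall_cell_probabilities(5.0, lam_a, lam_b, lam_ab)          # different main effect
phi_3 = logit_probabilities(lam_a[:, None] + lam_b[None, :] + lam_ab)  # logits (1.4), no mu
print(np.allclose(phi_1, phi_2), np.allclose(phi_1, phi_3))            # True True
```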


We could instead condition on the row totals in order to obtain our analysis for an $r \times s$ contingency table with row totals fixed. For each $i = 1, \ldots, r$, the distribution of $y_{i1}, \ldots, y_{is}$ conditional on

$$\sum_g y_{ig} = n_i \qquad (1.6)$$

is multinomial with respective cell probabilities $\phi^B_{i1}, \ldots, \phi^B_{is}$ satisfying

$$\phi^B_{ij} = \theta_{ij}\Big/\sum_g\theta_{ig} \qquad (j = 1, \ldots, s). \qquad (1.7)$$

The assumption in (1.1) may now be replaced by

$$\gamma^B_{ij} = \lambda^B_j + \lambda^{AB}_{ij}, \qquad (1.8)$$

where we have $r$ separate sets of multivariate logits satisfying

$$\phi^B_{ij} = e^{\gamma^B_{ij}}\Big/\sum_g e^{\gamma^B_{ig}} \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s). \qquad (1.9)$$

This analysis will therefore also be appropriate when we have $r$ independent multinomial distributions, each with $s$ cells, in which case the main and row effects of the unconditional Poisson situation cancel out. For example, when $s = 2$ we have a logistic linear model for binomial data. In all cases, results for the conditional models may be obtained by first analyzing the unconditional Poisson situation and then referring to appropriate transformations of the parameters.
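
In the same spirit, the sketch below (NumPy assumed; names and effects hypothetical) checks (1.7) to (1.9). After conditioning on the row totals, dropping $\mu$ and the row effects $\lambda^A_i$ leaves the conditional probabilities unchanged, and with $s = 2$ each row reduces to a logistic model whose log-odds are $\lambda^B_2 - \lambda^B_1 + \lambda^{AB}_{i2} - \lambda^{AB}_{i1}$.

```python
import numpy as np

def row_conditional_probabilities(mu, lam_a, lam_b, lam_ab):
    """phi^B_{ij} = theta_{ij} / sum_g theta_{ig}, as in (1.7)."""
    theta = np.exp(mu + lam_a[:, None] + lam_b[None, :] + lam_ab)   # (1.1)
    return theta / theta.sum(axis=1, keepdims=True)

def row_logit_probabilities(gamma_b):
    """Row-wise multivariate logits, as in (1.9)."""
    w = np.exp(gamma_b - gamma_b.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
r, s = 3, 2                                   # s = 2: r independent binomial problems
lam_a, lam_b = rng.normal(size=r), rng.normal(size=s)
lam_ab = rng.normal(scale=0.3, size=(r, s))

p_full = row_conditional_probabilities(1.0, lam_a, lam_b, lam_ab)
p_reduced = row_conditional_probabilities(0.0, np.zeros(r), lam_b, lam_ab)  # no mu, no lambda^A
p_logit = row_logit_probabilities(lam_b[None, :] + lam_ab)                 # gamma^B of (1.8)
log_odds = np.log(p_full[:, 1] / p_full[:, 0])
print(np.allclose(p_full, p_reduced), np.allclose(p_full, p_logit))       # True True
print(np.allclose(log_odds, lam_b[1] - lam_b[0] + lam_ab[:, 1] - lam_ab[:, 0]))  # True
```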


2. The Prior Distribution

A two-stage prior distribution is assumed for the unconditional Poisson means $\theta_{ij}$. At the first stage, it is supposed that the $\theta_{ij}$ are a priori independent and Gamma distributed, given $\alpha$ and $\xi$, with respective parameters $\alpha\xi_{ij}$ and $\alpha$, and densities

$$\pi(\theta_{ij} \mid \alpha, \xi_{ij}) = \alpha^{\alpha\xi_{ij}}\,\theta_{ij}^{\alpha\xi_{ij} - 1}\exp\{-\alpha\theta_{ij}\}\big/\Gamma(\alpha\xi_{ij}) \qquad (2.1)$$

$(0 < \theta_{ij} < \infty;\ 0 < \alpha\xi_{ij} < \infty;\ i = 1, \ldots, r;\ j = 1, \ldots, s)$.

The conditional prior mean and variance of $\theta_{ij}$, given $\alpha$ and $\xi_{ij}$, are now $\xi_{ij}$ and $\xi_{ij}/\alpha$ respectively. The prior parameter $\alpha$ measures the degree of belief in the prior estimate $\xi_{ij}$.

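Under these assumptions, the cell probabilities $\phi^{AB}_{ij}$ in (1.3) possess a single Dirichlet distribution, with joint density

$$\pi(\phi^{AB} \mid \alpha, \xi) = \frac{\Gamma(\alpha\xi_{\cdot\cdot})}{\prod_{i,j}\Gamma(\alpha\xi_{ij})}\prod_{i,j}\big(\phi^{AB}_{ij}\big)^{\alpha\xi_{ij} - 1} \qquad \Big(\sum_{i,j}\phi^{AB}_{ij} = 1;\ \alpha > 0\Big), \qquad (2.2)$$

where $\xi_{\cdot\cdot} = \sum_{k,g}\xi_{kg}$, and

$$\zeta^{AB}_{ij} = \xi_{ij}\big/\xi_{\cdot\cdot} \qquad (2.3)$$

denotes the prior mean of $\phi^{AB}_{ij}$. Therefore our independent first-stage Gamma priors also imply a conjugate prior distribution in the single multinomial situation corresponding to an $r \times s$ contingency table with no margins fixed.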

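Similarly, for $i = 1, \ldots, r$, the joint distributions of the conditional cell probabilities $\phi^B_{i1}, \ldots, \phi^B_{is}$ in (1.7) are independent Dirichlet with joint densities

$$\pi(\phi^B_i \mid \alpha, \xi) = \frac{\Gamma(\alpha\xi_{i\cdot})}{\prod_j\Gamma(\alpha\xi_{ij})}\prod_j\big(\phi^B_{ij}\big)^{\alpha\xi_{ij} - 1} \qquad \Big(\sum_j\phi^B_{ij} = 1\Big), \qquad (2.4)$$

where $\xi_{i\cdot} = \sum_g\xi_{ig}$, and

$$\zeta^B_{ij} = \xi_{ij}\big/\xi_{i\cdot} \qquad (j = 1, \ldots, s) \qquad (2.5)$$

denotes the prior mean of $\phi^B_{ij}$, so that our assumptions also yield a conjugate analysis for the situation where the row totals are fixed.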
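
The Gamma-to-Dirichlet step can be checked by simulation. This sketch assumes NumPy, and the values of $\alpha$ and $\xi_{ij}$ are hypothetical. It draws the $\theta_{ij}$ from (2.1), normalizes them overall and by rows, and compares the Monte Carlo means with $\zeta^{AB}_{ij}$ in (2.3) and $\zeta^B_{ij}$ in (2.5). It also compares the variances with the standard Dirichlet value $\zeta(1 - \zeta)/(\alpha\xi_{\cdot\cdot} + 1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 3.0
xi = np.array([[4.0, 2.0, 1.0],
               [1.0, 3.0, 5.0]])                 # hypothetical prior means xi_{ij}

# theta_{ij} ~ Gamma(shape alpha * xi_{ij}, rate alpha), as in (2.1); NumPy uses scale = 1 / rate
n_draws = 200_000
theta = rng.gamma(shape=alpha * xi, scale=1.0 / alpha, size=(n_draws,) + xi.shape)

phi_ab = theta / theta.sum(axis=(1, 2), keepdims=True)   # one Dirichlet over all cells, (2.2)
phi_b = theta / theta.sum(axis=2, keepdims=True)         # one Dirichlet per row, (2.4)

zeta_ab = xi / xi.sum()                                  # (2.3)
zeta_b = xi / xi.sum(axis=1, keepdims=True)              # (2.5)
dirichlet_var = zeta_ab * (1 - zeta_ab) / (alpha * xi.sum() + 1)

print(np.abs(phi_ab.mean(axis=0) - zeta_ab).max())      # Monte Carlo error only
print(np.abs(phi_b.mean(axis=0) - zeta_b).max())
print(np.max(np.abs(phi_ab.var(axis=0) / dirichlet_var - 1)))
```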

We now make a central assumption concerning the means $\xi_{ij}$ of the first-stage priors. This is more general than the approach of Good (1976) in the single multinomial situation; Good takes all the $\zeta^{AB}_{ij}$ in (2.3) to be equal, implying exchangeability of the cell probabilities. We instead suppose that

$$\xi_{ij} = \xi_{ij}(\beta) \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s), \qquad (2.6)$$

where the functional form of $\xi_{ij}(\cdot)$ is specified and $\beta$ is a $q \times 1$ vector of parameters, where $q < rs$, corresponding to a reduced form of the model. An important special case is

$$\xi_{ij} = \exp\{\mu + \lambda^A_i + \lambda^B_j\} \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s), \qquad (2.7)$$

corresponding to the independence model. Our prior assumptions then say that we believe that the row and column factors may be independent, and that we wish to express a degree of certainty in this belief, as represented by the parameter $\alpha$. A large value for $\alpha$ says that we are fairly certain about independence; as $\alpha$ decreases towards zero, this certainty diminishes.

Under assumption (2.7) there is an overparametrization, which can be resolved by introducing any two independent constraints. For purposes of derivation we set $\lambda^A_1 = \lambda^B_1 = 0$, but our analysis will not ultimately depend upon which particular constraints are chosen. The vector $\beta$ then comprises $q = r + s - 1$ parameters $\mu, \lambda^A_2, \ldots, \lambda^A_r$ and $\lambda^B_2, \ldots, \lambda^B_s$.

Many different reduced models could be taken to replace (2.7). If $r = s$, we might have

$$\xi_{ij} = \exp\{\mu + \lambda^A_i + \lambda^B_j + \delta_{ij}\lambda^{AB}_{ij}\} \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s), \qquad (2.8)$$

where $\delta_{ij}$ is the Kronecker delta function. The assumption in (2.8) implies a quasi-independence model, where the only non-zero interactions are along the diagonal of the table. Note, however, that only the prior means $\xi_{ij}$, and not the cell parameters $\theta_{ij}$, are restricted by special assumptions. A much more general model can hold for the $\theta_{ij}$, whatever is assumed for the $\xi_{ij}$: the $\theta_{ij}$ possess prior variability around the reduced model.

Another possibility, if the row and column factors are measured on ordered scales, is to take

$$\log\xi_{ij}(\beta) = \beta_1 + \log p_{ij}(\beta_2, \ldots, \beta_6),$$

where $p_{ij}$ is the fitted cell probability corresponding to an underlying continuous distribution, e.g., a bivariate normal with five parameters $(\beta_2, \ldots, \beta_6)$. In this case our analysis provides a procedure for investigating the reasonableness of this parametric assumption.

A parameter of particular interest is

$$\rho_{ij} = \log\theta_{ij} - \log\xi_{ij}(\beta), \qquad (2.9)$$

which could in general be called a parametric residual between the log of the $(i,j)$th cell parameter $\theta_{ij}$ and the log of the corresponding prior mean $\xi_{ij}(\beta)$ under the reduced form of the model. A data-based estimate for $\rho_{ij}$ would help us to judge the deviation of the $(i,j)$th cell mean from its fitted value under the reduced model. Therefore, when judging the plausibility of the reduced model, it will be particularly important to obtain posterior estimates and distributions for the $\rho_{ij}$.

Under the particular independence assumption in (2.7), $\rho_{ij}$ reduces, via (1.1), to

$$\rho_{ij} = \mu + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij} - (\mu + \lambda^A_i + \lambda^B_j) = \lambda^{AB}_{ij}, \qquad (2.10)$$

i.e., this is precisely the interaction effect $\lambda^{AB}_{ij}$. Therefore, as a special case of our analysis, we shall consider the posterior distributions of the interaction effects. Note that no functional constraints are required on the $\lambda^{AB}_{ij}$, owing to our Bayesian assumption that, given $\alpha$ and $\beta$, the $\theta_{ij}$ possess a proper prior distribution with means $\xi_{ij}(\beta)$. In the independence case our reduced model is an obvious reparameterization of Rasch's multiplicative Poisson model; see Rasch (1960), Leonard (1973), and Lord and Novick (1968, p. 486). Our analysis will therefore provide a procedure for checking the adequacy of Rasch's model.

We now turn to the second stage of our prior model, and consider the first-stage prior parameters $\alpha$ and $\beta$. The parameter $\alpha$ is referred to by Fienberg and Holland (1973) as the flattening constant, but we prefer the terminology shrinkage parameter. This parameter measures the degree of belief in the null model, and it would be overambitious to specify its value solely via a subjective evaluation. We therefore turn to a hierarchical Bayesian procedure and assume a prior distribution for $\alpha$. This will permit the data to provide some information concerning reasonable values of $\alpha$. An alternative parameterization, useful in the posterior analysis, is

$$\zeta = 1/(\alpha + 1), \qquad (2.11)$$

and, for simplicity, we assume the ignorance prior under which $\zeta$ is uniformly distributed over the unit interval. This implies the prior density

$$\pi(\alpha) = 1/(\alpha + 1)^2 \qquad (0 < \alpha < \infty) \qquad (2.12)$$

for $\alpha$, which possesses a long Cauchy-like tail. This is proposed as an alternative to Good's log-Cauchy density, which depends upon further prior parameters.
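
The change of variables from (2.11) to (2.12) can be checked by simulation. A minimal sketch, assuming NumPy: draw $\zeta$ uniformly, set $\alpha = 1/\zeta - 1$, and compare the empirical distribution function with the closed form $\int_0^a (1 + t)^{-2}\,dt = a/(1 + a)$ implied by (2.12).

```python
import numpy as np

rng = np.random.default_rng(3)
zeta = rng.uniform(size=500_000)          # ignorance prior: zeta ~ U(0, 1)
alpha = 1.0 / zeta - 1.0                  # invert (2.11)

def prior_cdf(a):
    """CDF of (2.12): int_0^a (1 + t)^(-2) dt = a / (1 + a)."""
    return a / (1.0 + a)

for a in (0.5, 1.0, 4.0, 99.0):
    print(a, np.mean(alpha <= a), prior_cdf(a))
# P(alpha > a) = 1 / (1 + a) decays like 1/a, the Cauchy-like tail noted in the text
```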

The prior parameters $\beta$ could easily be taken to also possess a proper prior distribution. However, for simplicity, we suppose that they are uniformly distributed over $q$-dimensional Euclidean space.

The ideas discussed in this section are related to the general model checking approach of Leonard (1983). Note that our analysis, based upon ideas of estimation and inference, will provide an alternative to standard tests of significance, e.g., the chi-square goodness-of-fit test.

Early Bayesian theoretical ideas on marginalization in a contingency table context are described in an unpublished report by Leonard (1972); practical applications for an $m \times 2$ table are discussed by Lewis, Wang, and Novick (1975). See also a discussion by Leonard (1974).
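
Given $\alpha$ and $\beta$, the $\theta_{ij}$ are a posteriori independent, with densities

$$\pi(\theta_{ij} \mid \alpha, \beta, y) = \frac{(\alpha + 1)^{y_{ij} + \alpha\xi_{ij}}\,\theta_{ij}^{y_{ij} + \alpha\xi_{ij} - 1}\exp\{-(\alpha + 1)\theta_{ij}\}}{\Gamma(y_{ij} + \alpha\xi_{ij})} \qquad (0 < \theta_{ij} < \infty), \qquad (3.1)$$

where $\xi_{ij} = \xi_{ij}(\beta)$. In particular, the conditional posterior mean of $\theta_{ij}$ is

$$E(\theta_{ij} \mid \alpha, \beta, y) = \zeta y_{ij} + (1 - \zeta)\,\xi_{ij}(\beta), \qquad (3.2)$$

where $\zeta = 1/(1 + \alpha)$. This compromises between $\xi_{ij}(\beta)$, representing the reduced model, and $y_{ij}$, representing the full (unstructured) model. The estimation of $\zeta$ is critically important when judging how to compromise between these two extremes.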
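
The shrinkage form of (3.2) follows directly from the Gamma posterior (3.1). The minimal Python sketch below (SciPy assumed; the cell count, $\alpha$ and $\xi_{ij}$ are hypothetical) builds that Gamma distribution with shape $y_{ij} + \alpha\xi_{ij}$ and rate $\alpha + 1$ and compares its mean with $\zeta y_{ij} + (1 - \zeta)\xi_{ij}$.

```python
import numpy as np
from scipy import stats

def conditional_posterior(y_ij, alpha, xi_ij):
    """Gamma posterior (3.1) of theta_ij given alpha and beta (shape and rate form)."""
    shape = y_ij + alpha * xi_ij
    rate = alpha + 1.0
    return stats.gamma(a=shape, scale=1.0 / rate)     # SciPy parametrizes by scale = 1 / rate

y_ij, alpha, xi_ij = 12, 4.0, 7.5     # hypothetical cell count, shrinkage parameter, prior mean
zeta = 1.0 / (1.0 + alpha)
post = conditional_posterior(y_ij, alpha, xi_ij)
print(post.mean(), zeta * y_ij + (1 - zeta) * xi_ij)  # the two agree, as (3.2) states
```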

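We next consider the first-stage prior parameters $\alpha$ and $\beta$. With appropriate integrations with respect to the $\theta_{ij}$ from the joint distribution of the $y_{ij}$, $\theta_{ij}$, $\alpha$, and $\beta$, we find that their (exact) joint posterior density is given by

$$\begin{aligned}\pi(\alpha, \beta \mid y) \propto{}& (1 + \alpha)^{-2}\exp\Big\{\sum_{i,j}\log\Gamma(y_{ij} + \alpha\xi_{ij}) - \sum_{i,j}\log\Gamma(\alpha\xi_{ij})\Big\} \\ &\times\exp\Big\{-\Big(\sum_{i,j}y_{ij} + \alpha\sum_{i,j}\xi_{ij}\Big)\log(1 + \alpha) + \alpha\sum_{i,j}\xi_{ij}\log\alpha\Big\},\end{aligned} \qquad (3.3)$$

where $\xi_{ij} = \xi_{ij}(\beta)$.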
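
For the independence model (2.7), the logarithm of (3.3) can be coded directly. In the sketch below (NumPy and SciPy assumed; the function names, the table `Y` and the value of `beta` are hypothetical) the second function recomputes the same quantity from the negative binomial marginal distribution of $y_{ij}$, given $\alpha$ and $\beta$, which is what the integration over the $\theta_{ij}$ produces. The two agree once the terms $\log y_{ij}!$, which are free of $\alpha$ and $\beta$, are added back.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

def xi_independence(beta, r, s):
    """Prior means (2.7) with lambda^A_1 = lambda^B_1 = 0; beta = (mu, lam^A_2..r, lam^B_2..s)."""
    lam_a = np.concatenate([[0.0], beta[1:r]])
    lam_b = np.concatenate([[0.0], beta[r:r + s - 1]])
    return np.exp(beta[0] + lam_a[:, None] + lam_b[None, :])

def log_joint_posterior(alpha, beta, Y):
    """Logarithm of (3.3), up to an additive constant."""
    xi = xi_independence(beta, *Y.shape)
    ax = alpha * xi
    return (-2.0 * np.log1p(alpha)
            + np.sum(gammaln(Y + ax) - gammaln(ax))
            - (Y.sum() + alpha * xi.sum()) * np.log1p(alpha)
            + alpha * xi.sum() * np.log(alpha))

def log_joint_via_nbinom(alpha, beta, Y):
    """The same quantity from the negative binomial marginal of y_ij given alpha and beta."""
    xi = xi_independence(beta, *Y.shape)
    loglik = nbinom.logpmf(Y, alpha * xi, alpha / (1.0 + alpha)).sum()
    # prior term for alpha, plus the log y_ij! terms that (3.3) absorbs into its constant
    return -2.0 * np.log1p(alpha) + loglik + gammaln(Y + 1.0).sum()

Y = np.array([[20, 15, 10], [12, 25, 18], [8, 14, 30], [16, 11, 9]])   # hypothetical table
beta = np.array([2.5, 0.1, 0.2, -0.1, 0.05, 0.1])                      # (mu, lam^A_2..4, lam^B_2..3)
print(log_joint_posterior(3.0, beta, Y), log_joint_via_nbinom(3.0, beta, Y))
```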
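
In order to approximately marginalize (3.3) with respect to $\beta$, let $\hat\beta_\alpha$ denote the conditional posterior mode vector of $\beta$, given $\alpha$. This satisfies the following equation in $\beta$:

$$\sum_{i,j}\big[\psi(y_{ij} + \alpha\xi_{ij}) - \psi(\alpha\xi_{ij}) - \log(1 + \alpha) + \log\alpha\big]\frac{\partial\xi_{ij}(\beta)}{\partial\beta} = 0, \qquad (3.4)$$

where $\psi(z) = \partial\log\Gamma(z)/\partial z$, and, under the special independence assumption in (2.7),

$$\frac{\partial\xi_{ij}(\beta)}{\partial\beta} = (\xi_{ij}, 0, \ldots, 0, \xi_{ij}, 0, \ldots, 0, \xi_{ij}, 0, \ldots, 0)^T, \qquad (3.5)$$

where the only positive elements appear in the first, $i$th, and $(r + j - 1)$th positions (for $i, j \geq 2$; under the constraints $\lambda^A_1 = \lambda^B_1 = 0$ the corresponding element is absent when $i = 1$ or $j = 1$).

Following Leonard (1982) and Tierney and Kadane (1984), we use the approximation, based upon a Taylor series expansion of the logarithm of (3.3) about $\hat\beta_\alpha$,

$$\pi(\alpha, \beta \mid y) \approx \pi(\alpha, \hat\beta_\alpha \mid y)\exp\big\{-\tfrac12(\beta - \hat\beta_\alpha)^T R_\alpha(\beta - \hat\beta_\alpha)\big\}, \qquad (3.6)$$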
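
where $R_\alpha$, the posterior information matrix of $\beta$ given $\alpha$, satisfies

$$\begin{aligned}R_\alpha &= -\frac{\partial^2\log\pi(\alpha, \beta \mid y)}{\partial\beta\,\partial\beta^T}\bigg|_{\beta = \hat\beta_\alpha} \\ &= -\alpha\sum_{i,j}\big[\psi(y_{ij} + \alpha\xi_{ij}) - \psi(\alpha\xi_{ij}) - \log(1 + \alpha) + \log\alpha\big]\frac{\partial^2\xi_{ij}}{\partial\beta\,\partial\beta^T} \\ &\quad - \alpha^2\sum_{i,j}\big[\psi'(y_{ij} + \alpha\xi_{ij}) - \psi'(\alpha\xi_{ij})\big]\frac{\partial\xi_{ij}}{\partial\beta}\frac{\partial\xi_{ij}}{\partial\beta^T},\end{aligned} \qquad (3.7)$$

evaluated at $\beta = \hat\beta_\alpha$, where $\psi'(z) = \partial^2\log\Gamma(z)/\partial z^2$ and, under (2.7), the matrix of second derivatives of $\xi_{ij}$ possesses just nine non-zero elements, each equal to $\xi_{ij}$ in (3.5), in the $(1,1)$th, $(1,i)$th, $(i,1)$th, $(i,i)$th, $(1, r+j-1)$th, $(i, r+j-1)$th, $(r+j-1, 1)$th, $(r+j-1, i)$th, and $(r+j-1, r+j-1)$th positions.

The approximation in (3.6) tells us that:

(a) The conditional posterior distribution of $\beta$, given $\alpha$, is approximately multivariate normal,

$$\beta \mid \alpha, y \sim N(\hat\beta_\alpha, R_\alpha^{-1}), \qquad (3.8)$$

with mean vector $\hat\beta_\alpha$ and covariance matrix $R_\alpha^{-1}$.

(b) By integration with respect to $\beta$, the marginal posterior density of $\alpha$ is, approximately,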

$$\pi^*(\alpha \mid y) \propto (2\pi)^{\frac12 q}\,\pi(\alpha, \hat\beta_\alpha \mid y)\big/|R_\alpha|^{\frac12} \qquad (0 < \alpha < \infty). \qquad (3.9)$$


For fixed $\alpha$, the approximate posterior mean $\hat\beta_\alpha$ of $\beta$ provides a smoothing estimator of $\beta$ which adjusts the usual maximum likelihood estimator $\hat\beta$ of $\beta$ under the reduced model, by compensating for prior uncertainty about the reduced (e.g., independence) model. Under the independence model (2.7) with $\lambda^A_1 = \lambda^B_1 = 0$, we have

$$\hat\beta = (\hat\mu, \hat\lambda^A_2, \ldots, \hat\lambda^A_r, \hat\lambda^B_2, \ldots, \hat\lambda^B_s)^T,$$

where

$$\hat\mu = \log y_{1\cdot} + \log y_{\cdot 1} - \log y_{\cdot\cdot}, \qquad \hat\lambda^A_i = \log y_{i\cdot} - \log y_{1\cdot}, \qquad \hat\lambda^B_j = \log y_{\cdot j} - \log y_{\cdot 1}. \qquad (3.10)$$

The estimators in (3.10) may be used as starting values for the solution of (3.4), e.g., using Newton-Raphson; then (3.7) is the limit of the negative Hessian in the Newton-Raphson iterations. For each $\alpha$, the solutions for $\hat\beta_\alpha$ and $R_\alpha$ may be used together with (3.3) to calculate the approximate marginal posterior density of $\alpha$ in (3.9).
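
The steps in (3.4) to (3.10) can be sketched as a Newton-Raphson routine for the independence model. This is a hedged illustration, not the authors' code. It assumes NumPy and SciPy, stores the table in row-major order with design rows $d_{ij}$ so that $\xi_{ij} = \exp(d_{ij}^T\beta)$, adds a simple step-halving safeguard that the text does not mention, and uses a hypothetical $4 \times 3$ table (so that $rs - r - s + 1 = 6$). For each $\alpha$ it returns $\hat\beta_\alpha$, $R_\alpha$ and the logarithm of (3.9) up to a constant.

```python
import numpy as np
from scipy.special import gammaln, digamma, polygamma

def design_independence(r, s):
    """Rows d_ij (row-major cell order) for (2.7) with lambda^A_1 = lambda^B_1 = 0."""
    D = np.zeros((r * s, r + s - 1))
    for i in range(r):
        for j in range(s):
            D[i * s + j, 0] = 1.0                    # mu
            if i > 0:
                D[i * s + j, i] = 1.0                # lambda^A for 1-based row i + 1
            if j > 0:
                D[i * s + j, r + j - 1] = 1.0        # lambda^B for 1-based column j + 1
    return D

def start_values(Y):
    """Independence-model maximum likelihood estimates (3.10)."""
    row, col, tot = Y.sum(axis=1), Y.sum(axis=0), Y.sum()
    mu = np.log(row[0]) + np.log(col[0]) - np.log(tot)
    return np.concatenate([[mu], np.log(row[1:] / row[0]), np.log(col[1:] / col[0])])

def log_post(beta, alpha, y, D):
    """Log of (3.3) up to a constant; y is the flattened table."""
    xi = np.exp(D @ beta)
    ax = alpha * xi
    return (-2.0 * np.log1p(alpha) + np.sum(gammaln(y + ax) - gammaln(ax))
            - (y.sum() + alpha * xi.sum()) * np.log1p(alpha) + alpha * xi.sum() * np.log(alpha))

def score_and_information(beta, alpha, y, D):
    """Gradient of log (3.3) (alpha times the left side of (3.4)) and R_alpha of (3.7)."""
    xi = np.exp(D @ beta)                      # under (2.7): d xi / d beta = xi * d_ij
    ax = alpha * xi
    A = digamma(y + ax) - digamma(ax) - np.log1p(alpha) + np.log(alpha)
    B = polygamma(1, y + ax) - polygamma(1, ax)
    score = alpha * D.T @ (A * xi)
    w = -alpha * A * xi - alpha**2 * B * xi**2  # uses d^2 xi / d beta d beta^T = xi * d_ij d_ij^T
    return score, D.T @ (w[:, None] * D)

def conditional_mode(alpha, y, D, beta0, tol=1e-10, max_iter=100):
    """Newton-Raphson for (3.4), started at (3.10), with step halving."""
    beta = beta0.copy()
    for _ in range(max_iter):
        g, R = score_and_information(beta, alpha, y, D)
        step = np.linalg.solve(R, g)
        t, f0 = 1.0, log_post(beta, alpha, y, D)
        while log_post(beta + t * step, alpha, y, D) < f0 and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta, score_and_information(beta, alpha, y, D)[1]

def log_marginal_alpha(alpha, y, D, beta0):
    """Log of the Laplace approximation (3.9), up to an additive constant."""
    beta_hat, R = conditional_mode(alpha, y, D, beta0)
    q = D.shape[1]
    return 0.5 * q * np.log(2 * np.pi) + log_post(beta_hat, alpha, y, D) - 0.5 * np.linalg.slogdet(R)[1]

Y = np.array([[20, 15, 10], [12, 25, 18], [8, 14, 30], [16, 11, 9]])   # hypothetical table
D, y = design_independence(*Y.shape), Y.ravel().astype(float)
beta0 = start_values(Y)
beta_hat, R = conditional_mode(5.0, y, D, beta0)
print(beta0, beta_hat)
print([round(log_marginal_alpha(a, y, D, beta0), 3) for a in (1.0, 5.0, 25.0)])
```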

Some applications of these results are now described.

(a) Transforming (3.9) to the corresponding posterior density of $\zeta = 1/(\alpha + 1)$ is useful. A plot of this density on the interval $(0, 1)$ summarises the information contained in the data about $\zeta$, given our assumptions, and helps us to judge the adequacy of the reduced model. If the density is concentrated near zero, this suggests that the reduced model provides a reasonable fit to the data. If the density is concentrated near one, then the reduced model is unlikely to be appropriate. It may be useful to compare the posterior expectation $E(\zeta \mid y)$, obtained by appropriate integration, with the central value $\zeta = \tfrac12$. The loss function arguments of Leonard (1983) suggest that $E(\zeta \mid y) > \tfrac12$ may provide an alternative critical region to that implied by standard fixed-size tests for the goodness of fit of the reduced model.

(b) For any linear combination $\ell^T\beta$ of $\beta$, the posterior probability, given $\alpha$, that $\ell^T\beta \leq z$ is approximated by

$$P^*(\ell^T\beta \leq z \mid \alpha, y) = \Phi\Big\{(z - \ell^T\hat\beta_\alpha)\big/(\ell^T R_\alpha^{-1}\ell)^{\frac12}\Big\}, \qquad (3.11)$$

where $\Phi$ is the cumulative normal distribution function and $\hat\beta_\alpha$ and $R_\alpha$ satisfy (3.4) and (3.7). The corresponding probability, unconditional upon $\alpha$, may be approximated by

$$P^*(\ell^T\beta \leq z \mid y) = \int_0^\infty P^*(\ell^T\beta \leq z \mid \alpha, y)\,\pi^*(\alpha \mid y)\,d\alpha, \qquad (3.12)$$

where $\pi^*(\alpha \mid y)$ is described in (3.9). The one-dimensional integration in (3.12) may be performed accurately using numerical techniques. It is best to first transform to $\zeta = 1/(\alpha + 1)$ and then to integrate over the unit interval. Similar integrations may be performed for the marginal density of $\ell^T\beta$. For simple point estimation it suffices to average the estimate $\ell^T\hat\beta_\alpha$ with respect to the distribution for $\alpha$ in (3.9).

(c) The unconditional posterior mean of $\theta_{ij}$ may be obtained by averaging the conditional mean in (3.2) with respect to the posterior distribution of $\alpha$ and $\beta$.
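
The one-dimensional averaging in (b) and (c) can be done on the $\zeta$ scale, as recommended above. The sketch below (NumPy and SciPy assumed) uses a midpoint rule on $(0, 1)$ and the Jacobian $|d\alpha/d\zeta| = \zeta^{-2}$. The functions `log_marg`, `mean_l` and `sd_l` are hypothetical stand-ins for $\log\pi^*(\alpha \mid y)$ and for $\ell^T\hat\beta_\alpha$ and $(\ell^T R_\alpha^{-1}\ell)^{1/2}$, which in practice would come from (3.4), (3.7) and (3.9).

```python
import numpy as np
from scipy.stats import norm

def average_over_zeta(log_marginal_alpha, conditional, num=2000):
    """Approximate int_0^inf conditional(alpha) pi*(alpha | y) d alpha, as in (3.12).

    Works on the zeta = 1/(alpha + 1) scale with a midpoint rule; log_marginal_alpha
    returns log pi*(alpha | y) up to an additive constant.
    """
    zeta = (np.arange(num) + 0.5) / num                 # midpoints of (0, 1)
    alpha = 1.0 / zeta - 1.0
    # pi*(zeta | y) = pi*(alpha | y) |d alpha / d zeta| = pi*(alpha | y) / zeta^2
    log_w = np.array([log_marginal_alpha(a) for a in alpha]) - 2.0 * np.log(zeta)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return float(np.sum(w * np.array([conditional(a) for a in alpha])))

# Hypothetical stand-ins for log pi*(alpha | y), l^T beta_hat_alpha and its standard error
log_marg = lambda a: -2.0 * np.log1p(a) + 3.0 * np.log(a) - 0.5 * a
mean_l = lambda a: 0.4 * a / (1.0 + a)
sd_l = lambda a: 0.2 + 0.1 / (1.0 + a)

prob = average_over_zeta(log_marg, lambda a: norm.cdf((0.5 - mean_l(a)) / sd_l(a)))  # (3.12), z = 0.5
zeta_star = average_over_zeta(log_marg, lambda a: 1.0 / (1.0 + a))                   # (3.16)
print(prob, zeta_star)
```

As a sanity check, feeding in the prior (2.12) itself as `log_marginal_alpha` makes $\zeta$ uniform, so averaging $1/(1 + \alpha)$ returns $\tfrac12$.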
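
In the special case where our reduced model is log-linear,

$$\log\xi_{ij} = d_{ij}^T\beta \qquad (i = 1, \ldots, r;\ j = 1, \ldots, s), \qquad (3.13)$$

the posterior mean of $\theta_{ij}$, conditional upon $\alpha$ but unconditional upon $\beta$, is approximated by

$$E^*(\theta_{ij} \mid y, \alpha) = \zeta y_{ij} + (1 - \zeta)\exp\big\{d_{ij}^T\hat\beta_\alpha + \tfrac12 d_{ij}^T R_\alpha^{-1}d_{ij}\big\}. \qquad (3.14)$$

The unconditional posterior mean of $\theta_{ij}$ is therefore approximated by

$$E^*(\theta_{ij} \mid y) = \zeta^* y_{ij} + (1 - \zeta^*)\,\xi^*_{ij}, \qquad (3.15)$$

with

$$\zeta^* = E^*(\zeta \mid y) = E\{1/(1 + \alpha) \mid y\} \qquad (3.16)$$

and

$$\xi^*_{ij} = E\Big[\frac{\alpha}{1 + \alpha}\exp\big\{d_{ij}^T\hat\beta_\alpha + \tfrac12 d_{ij}^T R_\alpha^{-1}d_{ij}\big\}\,\Big|\,y\Big]\Big/(1 - \zeta^*), \qquad (3.17)$$

where the quantities in (3.16) and (3.17) may be approximately computed by appropriate numerical integrations with respect to the distribution in (3.9).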
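
The exponential term in (3.14) is the lognormal mean of $\xi_{ij} = \exp(d_{ij}^T\beta)$ when $\beta \mid \alpha, y$ is approximately $N(\hat\beta_\alpha, R_\alpha^{-1})$. A minimal sketch, with hypothetical values of $\hat\beta_\alpha$, $R_\alpha$, $d_{ij}$, $y_{ij}$ and $\alpha$ (NumPy assumed), evaluates (3.14) and compares it with a Monte Carlo average over draws of $\beta$.

```python
import numpy as np

def conditional_mean_theta(y_ij, alpha, d_ij, beta_hat, R):
    """E*(theta_ij | y, alpha) of (3.14), taking beta | alpha, y ~ N(beta_hat, R^{-1})."""
    zeta = 1.0 / (1.0 + alpha)
    m = d_ij @ beta_hat                        # mean of log xi_ij = d_ij^T beta
    v = d_ij @ np.linalg.solve(R, d_ij)        # variance d_ij^T R^{-1} d_ij
    return zeta * y_ij + (1.0 - zeta) * np.exp(m + 0.5 * v)   # lognormal mean

# Hypothetical inputs, for illustration only
beta_hat = np.array([2.0, 0.3, -0.2])
R = np.array([[40.0, 5.0, 2.0], [5.0, 30.0, 1.0], [2.0, 1.0, 25.0]])
d_ij, y_ij, alpha = np.array([1.0, 1.0, 0.0]), 9.0, 3.0
approx = conditional_mean_theta(y_ij, alpha, d_ij, beta_hat, R)

# Monte Carlo check of the lognormal step
rng = np.random.default_rng(4)
draws = rng.multivariate_normal(beta_hat, np.linalg.inv(R), size=400_000)
zeta = 1.0 / (1.0 + alpha)
mc = zeta * y_ij + (1.0 - zeta) * np.exp(draws @ d_ij).mean()
print(approx, mc)
```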


Then (3.15) provides a shrinkage estimator for $\theta_{ij}$ which should perform much better than $y_{ij}$ with respect to squared error loss. Note that the $y_{ij}$ have very poor frequentist risk properties; see, for example, Clevenson and Zidek (1975) and Tsui (1981). We have suggested alternatives to the shrinkage estimators for Poisson means recommended in the literature. We also provide alternatives to the contingency table analyses of Leonard (1975) and Laird (1978), where the computations for marginal posterior distributions are slightly more tedious. Our approach could be regarded as similar to Good (1976), but with more flexible assumptions for the first-stage prior means, which permit us to incorporate model checking into our procedure. The very special case where all the $\xi_{ij}$ are equal to a common prior parameter would correspond to Good's procedure and exchangeability of the $\theta_{ij}$.

The practical idea is to start off with a possible reduced model, as represented by the $\xi_{ij}(\beta)$. Then the posterior distributions of the parametric residuals (e.g., interactions) $\rho_{ij}$, considered in section 5, help us to consider whether this model is appropriate. The posterior distribution of $\zeta = 1/(\alpha + 1)$ described above also permits an overall model check. Once we have finalized our choice of model, we may refer to the posterior distributions of all parameters, probabilities, and conditional probabilities of interest. Approximations are described in the next section; the latest working reduced model should always be incorporated as prior means, as long as this has scientific meaning rather than just being overparametrized to fit the data.


4. Approximations Based on the Chi-square Statistic

The classical approximate distribution for the chi-square goodness-of-fit statistic is based upon the assumption that, given the $\theta_{ij}$, the $y_{ij}$ are independent and approximately normally distributed with respective means $\theta_{ij}$ and variances $\theta_{ij}$. However, if we instead combine our Poisson sampling assumptions for the $y_{ij}$, given the $\theta_{ij}$, with our first-stage (Gamma) prior assumptions for the $\theta_{ij}$, given $\alpha$ and $\beta$, we find that the $y_{ij}$ are marginally (given $\alpha$ and $\beta$) independent, with respective means $\xi_{ij} = \xi_{ij}(\beta)$ and variances $\xi_{ij} + \xi_{ij}/\alpha = \tau^{-1}\xi_{ij}$, where

$$\tau = 1 - \zeta = \alpha/(\alpha + 1). \qquad (4.1)$$
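
A quick simulation can confirm the marginal mean $\xi_{ij}$ and variance $\xi_{ij}/\tau$ stated above for a single cell. The sketch assumes NumPy, and the values of $\alpha$ and $\xi_{ij}$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, xi = 4.0, 10.0                     # hypothetical shrinkage parameter and prior mean
tau = alpha / (alpha + 1.0)               # (4.1)

theta = rng.gamma(shape=alpha * xi, scale=1.0 / alpha, size=1_000_000)   # first stage, (2.1)
y = rng.poisson(theta)                                                   # Poisson sampling

print(y.mean(), xi)                       # marginal mean xi_ij
print(y.var(), xi / tau)                  # marginal variance xi_ij / tau = 12.5
```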


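If the $y_{ij}$ are taken to be marginally approximately normally distributed, then the distribution of

$$\tau X^2(\beta) = \tau\sum_{i,j}(y_{ij} - \xi_{ij})^2/\xi_{ij} \qquad (4.2)$$

is therefore approximately chi-square with $rs$ degrees of freedom. Moreover, the joint distribution of the $y_{ij}$, given $\tau$ and $\beta$, is approximately

$$p^*(y \mid \tau, \beta) \propto \tau^{\frac12 rs}\prod_{i,j}\xi_{ij}^{-\frac12}\exp\big\{-\tfrac12\tau X^2(\beta)\big\}. \qquad (4.3)$$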


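We proceed, for simplicity, under assumption (3.13), that the $\xi_{ij}$ follow a log-linear model. Then, as the prior distributions of $\tau$ and $\beta$ are uniform, it follows that their posterior density is, approximately,

$$\pi^*(\tau, \beta \mid y) \propto \tau^{\frac12 rs}\exp\big\{-\tfrac12 rs\,\bar d^T\beta - \tfrac12\tau X^2(\beta)\big\}, \qquad (4.4)$$

where $\bar d = (rs)^{-1}\sum_{i,j}d_{ij}$ and

$$X^2(\beta) = \sum_{i,j}y_{ij}^2e^{-d_{ij}^T\beta} + \sum_{i,j}e^{d_{ij}^T\beta} - 2\sum_{i,j}y_{ij}. \qquad (4.5)$$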


Therefore, the vector $\hat\beta_\alpha$ maximising (3.3) is approximated by the vector maximizing (4.4), and hence approximately satisfies the equation

$$\tau\sum_{i,j}d_{ij}\big(e^{d_{ij}^T\beta} - y_{ij}^2e^{-d_{ij}^T\beta}\big) + rs\,\bar d = 0, \qquad (4.6)$$

with $\tau = \alpha/(\alpha + 1)$.


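Owing to the absence of digamma functions, (4.6) should be more readily solvable by Newton-Raphson than (3.4). Furthermore, $R_\alpha$ in (3.7) may be approximately replaced by

$$R_\alpha \approx \tfrac12\tau\sum_{i,j}\big(e^{d_{ij}^T\hat\beta_\alpha} + y_{ij}^2e^{-d_{ij}^T\hat\beta_\alpha}\big)d_{ij}d_{ij}^T. \qquad (4.7)$$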


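Therefore the conditional posterior distribution of $\beta$, given $\alpha$, may be taken to be approximately multivariate normal with mean vector $\hat\beta_\alpha$ and covariance matrix $R_\alpha^{-1}$, where $\hat\beta_\alpha$ and $R_\alpha$ satisfy (4.6) and (4.7). Furthermore, the marginal posterior density for $\alpha$ in (3.9) may be approximately replaced, from (4.4), by

$$\pi^*(\alpha \mid y) \propto \alpha^{\frac12 rs}(1 + \alpha)^{-\frac12 rs - 2}\,|R_\alpha|^{-\frac12}\exp\Big\{-\tfrac12 rs\,\bar d^T\hat\beta_\alpha - \tfrac12\frac{\alpha}{\alpha + 1}X^2(\hat\beta_\alpha)\Big\} \qquad (0 < \alpha < \infty). \qquad (4.8)$$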
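
Because (4.6) involves no digamma functions, a sketch of its solution is short. The objective minimized below is minus the logarithm of (4.4), which is convex in $\beta$, so Newton-Raphson with step halving is a natural choice; its Hessian is the right side of (4.7). As before, this is a hedged sketch (NumPy assumed) for the independence model with a hypothetical table. The least-squares starting value is an assumption ((3.10) would serve equally well), and the last function returns the logarithm of (4.8) up to a constant.

```python
import numpy as np

def design_independence(r, s):
    """Rows d_ij (row-major cell order) for (2.7) with lambda^A_1 = lambda^B_1 = 0."""
    D = np.zeros((r * s, r + s - 1))
    for i in range(r):
        for j in range(s):
            D[i * s + j, 0] = 1.0
            if i > 0:
                D[i * s + j, i] = 1.0
            if j > 0:
                D[i * s + j, r + j - 1] = 1.0
    return D

def chi2_stat(beta, y, D):
    """X^2(beta) of (4.5) for log xi_ij = d_ij^T beta."""
    eta = D @ beta
    return np.sum(y**2 * np.exp(-eta)) + np.sum(np.exp(eta)) - 2.0 * y.sum()

def chi2_pieces(beta, tau, y, D):
    """Half the left side of (4.6), and R_alpha of (4.7)."""
    eta = D @ beta
    grad = 0.5 * (tau * D.T @ (np.exp(eta) - y**2 * np.exp(-eta)) + D.sum(axis=0))
    R = 0.5 * tau * D.T @ ((np.exp(eta) + y**2 * np.exp(-eta))[:, None] * D)
    return grad, R

def mode_chi2(alpha, y, D, beta0, tol=1e-10, max_iter=100):
    """Newton-Raphson for (4.6); the objective is minus log (4.4) up to a constant."""
    tau = alpha / (alpha + 1.0)
    objective = lambda b: 0.5 * D.sum(axis=0) @ b + 0.5 * tau * chi2_stat(b, y, D)  # convex
    beta = beta0.copy()
    for _ in range(max_iter):
        grad, R = chi2_pieces(beta, tau, y, D)
        step = np.linalg.solve(R, grad)
        t = 1.0
        while objective(beta - t * step) > objective(beta) and t > 1e-10:
            t *= 0.5
        beta = beta - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta, chi2_pieces(beta, tau, y, D)[1]

def log_marginal_alpha_chi2(alpha, y, D, beta0):
    """Log of (4.8), up to an additive constant."""
    rs = D.shape[0]
    beta, R = mode_chi2(alpha, y, D, beta0)
    return (0.5 * rs * np.log(alpha) - (0.5 * rs + 2.0) * np.log1p(alpha)
            - 0.5 * np.linalg.slogdet(R)[1] - 0.5 * D.sum(axis=0) @ beta
            - 0.5 * alpha / (alpha + 1.0) * chi2_stat(beta, y, D))

Y = np.array([[20, 15, 10], [12, 25, 18], [8, 14, 30], [16, 11, 9]])   # hypothetical table
D, y = design_independence(*Y.shape), Y.ravel().astype(float)
beta0 = np.linalg.lstsq(D, np.log(y), rcond=None)[0]    # crude start: least squares on log y
beta_hat, R = mode_chi2(5.0, y, D, beta0)
print(beta_hat)
print([round(log_marginal_alpha_chi2(a, y, D, beta0), 3) for a in (1.0, 5.0, 25.0)])
```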


Suggestions (a), (b), and (c) at the end of section 3 may all be carried out in terms of these approximations. However, still more explicit, though less accurate, approximations are available, based on the minimum chi-square statistic

$$X^2_{\min} = \min_\beta X^2(\beta). \qquad (4.9)$$

Firstly, the marginal distribution of $\tau X^2_{\min}$, given $\tau$ and the $\xi_{ij}$, is approximately chi-square with $rs - q$ degrees of freedom. It follows that the marginal posterior density of $\tau$, under our uniform prior, is approximately

$$\pi^*(\tau \mid y) \propto \tau^{\frac12(rs - q)}\exp\big\{-\tfrac12\tau X^2_{\min}\big\} \qquad (0 < \tau < 1), \qquad (4.10)$$


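which may be taken to replace (4.8). Secondly, a Taylor series expansion, up to the quadratic terms, of the expression in (4.5) gives

$$X^2(\beta) \approx X^2_{\min} + (\beta - \beta^*)^T R^*(\beta - \beta^*), \qquad (4.11)$$

where $\beta^*$ is the minimum chi-square estimate of $\beta$ and

$$R^* = \tfrac12\sum_{i,j}\big(e^{d_{ij}^T\beta^*} + y_{ij}^2e^{-d_{ij}^T\beta^*}\big)d_{ij}d_{ij}^T. \qquad (4.12)$$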


Substituting for $X^2(\beta)$ in (4.4) and completing the square, we obtain the approximations

$$\hat\beta_\alpha \approx \beta^* - \tfrac12 rs\,\tau^{-1}R^{*-1}\bar d \qquad (4.13)$$

and

$$R_\alpha \approx \tau R^* \qquad (4.14)$$

to the approximate posterior mean vector and information matrix of $\beta$, given $\alpha$. Roughly speaking, the approximation in (4.13) will, however, only be accurate if the average frequency in the table is large compared with $\tau^{-1}$. Alternatively, (4.6) is available.


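Note that, if $\tilde\beta$ is some other estimator of $\beta$ close to $\beta^*$, e.g., the maximum likelihood estimator under the reduced model, then

$$\beta^* \approx \tilde\beta + \tilde R^{-1}\tilde g \qquad \text{and} \qquad X^2_{\min} \approx X^2(\tilde\beta) - \tilde g^T\tilde R^{-1}\tilde g,$$

where

$$\tilde g = \tfrac12\sum_{i,j}d_{ij}\big(y_{ij}^2e^{-d_{ij}^T\tilde\beta} - e^{d_{ij}^T\tilde\beta}\big) \qquad \text{and} \qquad \tilde R = \tfrac12\sum_{i,j}\big(e^{d_{ij}^T\tilde\beta} + y_{ij}^2e^{-d_{ij}^T\tilde\beta}\big)d_{ij}d_{ij}^T.$$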

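Note further that, under the approximations in (4.13) and (4.14), the unconditional mean vector and covariance matrix of $\beta$ are respectively approximated by

$$E^*(\beta \mid y) = \beta^* - \tfrac12 rs\,E(\tau^{-1} \mid y)\,R^{*-1}\bar d$$

and

$$\mathrm{cov}^*(\beta \mid y) = E(\tau^{-1} \mid y)\,R^{*-1} + \tfrac14 r^2s^2\,\mathrm{var}(\tau^{-1} \mid y)\,R^{*-1}\bar d\,\bar d^TR^{*-1}, \qquad (4.20)$$

where, from (4.10),

$$E(\tau^{-1} \mid y) \approx \tfrac12 X^2_{\min}\,\gamma\big(\tfrac12 X^2_{\min}, \tfrac12(rs - q)\big)\big/\gamma\big(\tfrac12 X^2_{\min}, \tfrac12(rs - q + 2)\big) \qquad (4.21)$$

and

$$\mathrm{var}(\tau^{-1} \mid y) \approx \tfrac14\big(X^2_{\min}\big)^2\,\gamma\big(\tfrac12 X^2_{\min}, \tfrac12(rs - q - 2)\big)\big/\gamma\big(\tfrac12 X^2_{\min}, \tfrac12(rs - q + 2)\big) - \big[E(\tau^{-1} \mid y)\big]^2, \qquad (4.22)$$

with $\gamma(x, \nu) = \int_0^x z^{\nu - 1}e^{-z}\,dz$ denoting the incomplete Gamma function.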

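Finally, under (4.10), the posterior mean of $\tau = 1 - \zeta = \alpha/(\alpha + 1)$ is approximated by

$$E^*(\tau \mid y) = 2\,\gamma\big(\tfrac12 X^2_{\min}, \tfrac12(rs - q + 4)\big)\Big/\Big[X^2_{\min}\,\gamma\big(\tfrac12 X^2_{\min}, \tfrac12(rs - q + 2)\big)\Big], \qquad (4.23)$$

and the posterior variance of $\tau$ is also readily approximated.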
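
The moments (4.21) to (4.23) only need the lower incomplete Gamma function. One caution, in a sketch that assumes SciPy: `scipy.special.gammainc(a, x)` is the regularized function, $\gamma(x, a)/\Gamma(a)$ in the notation above, with its arguments in the opposite order, so it is rescaled by $\Gamma(a)$ below. The example values ($rs = 20$, $q = 8$, $X^2_{\min} = 15$) are hypothetical.

```python
import numpy as np
from scipy.special import gammainc, gammaln

def lower_incomplete_gamma(x, nu):
    """gamma(x, nu) = int_0^x z^(nu - 1) e^(-z) dz, in the argument order used in (4.21)."""
    # scipy.special.gammainc(a, x) is the regularized lower incomplete gamma, gamma(x, a) / Gamma(a)
    return gammainc(nu, x) * np.exp(gammaln(nu))

def tau_posterior_moments(x2_min, rs, q):
    """E(1/tau | y), var(1/tau | y) and E(tau | y) from (4.21) to (4.23).

    Based on the density (4.10); needs rs - q > 2 for the variance to exist.
    """
    c, k = 0.5 * x2_min, 0.5 * (rs - q)
    g = lambda nu: lower_incomplete_gamma(c, nu)
    e_inv = c * g(k) / g(k + 1)                          # (4.21)
    var_inv = c**2 * g(k - 1) / g(k + 1) - e_inv**2      # (4.22)
    e_tau = g(k + 2) / (c * g(k + 1))                    # (4.23)
    return e_inv, var_inv, e_tau

# Hypothetical example: 4 x 5 table, independence model (q = r + s - 1 = 8), X^2_min = 15
e_inv, var_inv, e_tau = tau_posterior_moments(15.0, 20, 8)
print(e_inv, var_inv, e_tau)
```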

5. Posterior Distributions of Quantities of Interest

Consider, firstly, the posterior density of the parametric residual $\rho_{ij}$ in (2.9). When $\alpha$ and $\beta$ are known, a simple transformation of the Gamma distribution in (3.1) tells us that the posterior distribution of $\rho_{ij}$ is given by

$$\pi(\rho_{ij} \mid y_{ij}, \alpha, \beta) = \exp\big\{(\rho_{ij} + \log\xi_{ij})(y_{ij} + \alpha\xi_{ij}) + (y_{ij} + \alpha\xi_{ij})\log(1 + \alpha) - (\alpha + 1)\xi_{ij}e^{\rho_{ij}} - \log\Gamma(y_{ij} + \alpha\xi_{ij})\big\}, \qquad (5.1)$$

where $\xi_{ij} = \xi_{ij}(\beta)$.
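
The density (5.1) is easy to evaluate on a grid. In this sketch (NumPy and SciPy assumed; the inputs are hypothetical) a Riemann sum confirms that it integrates to one and that $E(e^{\rho_{ij}}) = E(\theta_{ij})/\xi_{ij} = (y_{ij} + \alpha\xi_{ij})/\{(\alpha + 1)\xi_{ij}\}$, as the change of variables from (3.1) implies.

```python
import numpy as np
from scipy.special import gammaln

def log_density_rho(rho, y_ij, alpha, xi_ij):
    """log pi(rho_ij | y_ij, alpha, beta) from (5.1), with xi_ij = xi_ij(beta)."""
    shape = y_ij + alpha * xi_ij               # Gamma shape in (3.1); the rate is alpha + 1
    return ((rho + np.log(xi_ij)) * shape + shape * np.log1p(alpha)
            - (alpha + 1.0) * xi_ij * np.exp(rho) - gammaln(shape))

y_ij, alpha, xi_ij = 12, 4.0, 7.5             # hypothetical values
rho = np.linspace(-3.0, 3.0, 60_001)
h = rho[1] - rho[0]
dens = np.exp(log_density_rho(rho, y_ij, alpha, xi_ij))
total = dens.sum() * h                         # should be close to 1
mean_exp_rho = (np.exp(rho) * dens).sum() * h  # E(exp(rho)) = E(theta) / xi
print(total, mean_exp_rho, (y_ij + alpha * xi_ij) / ((alpha + 1.0) * xi_ij))
```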

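In sections 3 and 4 we suggested various approximations of the form

$$\beta \mid \alpha, y \sim N(\hat\beta_\alpha, R_\alpha^{-1}) \qquad (5.2)$$

to the conditional posterior distribution of $\beta$, given $\alpha$. Following the general approach of Leonard (1982), we may therefore approximate the conditional posterior density

$$\pi(\rho_{ij} \mid y, \alpha) = \int\pi(\rho_{ij} \mid y_{ij}, \beta, \alpha)\,\pi(\beta \mid y, \alpha)\,d\beta \qquad (5.3)$$

by

$$\begin{aligned}\pi^*(\rho_{ij} \mid y, \alpha) &= (2\pi)^{\frac12 q}|G_{ij}|^{-\frac12}\sup_\beta\big\{\pi(\rho_{ij} \mid y_{ij}, \beta, \alpha)\,\pi^*(\beta \mid y, \alpha)\big\} \\ &= |R_\alpha|^{\frac12}|G_{ij}|^{-\frac12}\,\pi(\rho_{ij} \mid y_{ij}, \hat\beta_{ij}, \alpha)\exp\big\{-\tfrac12(\hat\beta_{ij} - \hat\beta_\alpha)^TR_\alpha(\hat\beta_{ij} - \hat\beta_\alpha)\big\},\end{aligned} \qquad (5.4)$$

where $\pi^*(\beta \mid y, \alpha)$ denotes the normal density in (5.2), and $\hat\beta_{ij}$ satisfies the equation

$$a_{ij}\frac{\partial\xi_{ij}(\beta)}{\partial\beta} = R_\alpha(\beta - \hat\beta_\alpha), \qquad (5.5)$$

with

$$G_{ij} = R_\alpha + b_{ij}\frac{\partial\xi_{ij}}{\partial\beta}\frac{\partial\xi_{ij}}{\partial\beta^T} - a_{ij}\frac{\partial^2\xi_{ij}}{\partial\beta\,\partial\beta^T}, \qquad (5.6)$$

both evaluated at $\beta = \hat\beta_{ij}$, where

$$a_{ij} = \frac{\partial\log\pi}{\partial\xi_{ij}} = \alpha(\rho_{ij} + \log\xi_{ij}) + \xi_{ij}^{-1}(y_{ij} + \alpha\xi_{ij}) + \alpha\log(1 + \alpha) - (\alpha + 1)e^{\rho_{ij}} - \alpha\psi(y_{ij} + \alpha\xi_{ij}) \qquad (5.7)$$

and

$$b_{ij} = -\frac{\partial^2\log\pi}{\partial\xi_{ij}^2} = -\alpha\xi_{ij}^{-1} + \xi_{ij}^{-2}y_{ij} + \alpha^2\psi'(y_{ij} + \alpha\xi_{ij}), \qquad (5.8)$$

with $\pi = \pi(\rho_{ij} \mid y_{ij}, \xi_{ij}, \alpha)$ denoting the density in (5.1), regarded as a function of $\xi_{ij}$.

Equation (5.5) should be solved by Newton-Raphson, using $\hat\beta_\alpha$ as the starting vector for $\hat\beta_{ij}$. Then (5.4) provides our approximation to the posterior density of $\rho_{ij}$, given $\alpha$. The unconditional posterior density of $\rho_{ij}$ may then be computed from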

$$\pi^*(\rho_{ij} \mid y) = \int_0^\infty\pi^*(\rho_{ij} \mid y, \alpha)\,\pi^*(\alpha \mid y)\,d\alpha, \qquad (5.9)$$

where, for example, $\pi^*(\alpha \mid y)$ is obtained by transforming (4.10). One-dimensional numerical integrations are required.

To a first approximation, if limited computer facilities are available, the distribution in (5.4) could be replaced by the distribution in (5.1), but with $\beta$ set equal to $\hat\beta_\alpha$ in (4.13); then integrate with respect to (4.10).
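
The approximation (5.4) to (5.8) can be sketched for a log-linear reduced model $\xi_{ij} = \exp(d_{ij}^T\beta)$, for which $\partial\xi_{ij}/\partial\beta = \xi_{ij}d_{ij}$ and $\partial^2\xi_{ij}/\partial\beta\,\partial\beta^T = \xi_{ij}d_{ij}d_{ij}^T$. The code below is a hedged illustration (NumPy and SciPy assumed). The values of $\hat\beta_\alpha$, $R_\alpha$, $d_{ij}$, $y_{ij}$ and $\alpha$ are hypothetical inputs. Newton-Raphson solves (5.5) starting from $\hat\beta_\alpha$, with an added step-halving safeguard, and the resulting values of (5.4) on a grid are renormalized numerically.

```python
import numpy as np
from scipy.special import gammaln, digamma, polygamma

def log_pi_rho(rho, y, alpha, xi):
    """log of (5.1), viewed as a function of xi = xi_ij(beta)."""
    a = y + alpha * xi
    return (rho + np.log(xi)) * a + a * np.log1p(alpha) - (alpha + 1.0) * xi * np.exp(rho) - gammaln(a)

def a_and_b(rho, y, alpha, xi):
    """a_ij = d log pi / d xi from (5.7) and b_ij = -d^2 log pi / d xi^2 from (5.8)."""
    a = (alpha * (rho + np.log(xi)) + (y + alpha * xi) / xi + alpha * np.log1p(alpha)
         - (alpha + 1.0) * np.exp(rho) - alpha * digamma(y + alpha * xi))
    b = -alpha / xi + y / xi**2 + alpha**2 * polygamma(1, y + alpha * xi)
    return a, b

def laplace_rho(rho, y, alpha, d, beta_hat, R, max_iter=100, tol=1e-10):
    """Return (log pi*(rho | y, alpha) from (5.4), beta_ij solving (5.5))."""
    def objective(beta):                     # log pi(rho | xi(beta)) - 0.5 (beta - beta_hat)' R (...)
        diff = beta - beta_hat
        return log_pi_rho(rho, y, alpha, np.exp(d @ beta)) - 0.5 * diff @ R @ diff

    def pieces(beta):
        xi = np.exp(d @ beta)
        a, b = a_and_b(rho, y, alpha, xi)
        F = a * xi * d - R @ (beta - beta_hat)            # (5.5) written as F(beta) = 0
        G = R + (b * xi**2 - a * xi) * np.outer(d, d)     # (5.6)
        return F, G

    beta = beta_hat.copy()
    for _ in range(max_iter):
        F, G = pieces(beta)
        step = np.linalg.solve(G, F)                      # Newton step: the Jacobian of F is -G
        t, f0 = 1.0, objective(beta)
        while objective(beta + t * step) < f0 and t > 1e-10:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    _, G = pieces(beta)
    log_dens = 0.5 * np.linalg.slogdet(R)[1] - 0.5 * np.linalg.slogdet(G)[1] + objective(beta)
    return log_dens, beta

beta_hat = np.array([2.0, 0.3, -0.2])                                   # hypothetical beta_hat_alpha
R = np.array([[40.0, 5.0, 2.0], [5.0, 30.0, 1.0], [2.0, 1.0, 25.0]])    # hypothetical R_alpha
d, y, alpha = np.array([1.0, 1.0, 0.0]), 12.0, 4.0
grid = np.linspace(-2.0, 2.0, 801)
logd = np.array([laplace_rho(r, y, alpha, d, beta_hat, R)[0] for r in grid])
dens = np.exp(logd - logd.max())
dens /= dens.sum() * (grid[1] - grid[0])                               # numerical renormalization
print(grid[np.argmax(dens)])
```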

When the reduced model takes the independence form (2.7), (5.9) provides an approximation to the posterior density of the interaction effect $\rho_{ij} = \lambda^{AB}_{ij}$. Non-zero interactions may then be judged by investigating whether zero lies within the effective range of this posterior density. It is important to make a general practical inference, e.g., based upon the location, spread, skewness, and tail behavior of this density, rather than just referring, in classical mode, to a fixed tail probability. Highest density regions are probably also too formalistic for this situation.

Approximate posterior probabilities may be computed for many other quantities of interest in similar fashion. For example, when considering the cell means $\theta_{ij}$, the conditional density for $\theta_{ij}$ in (3.1) replaces the density for $\rho_{ij}$ in (5.1). The unconditional density may be computed from

$$\pi^*(\theta_{ij} \mid y) = \int_0^\infty\pi^*(\theta_{ij} \mid y, \alpha)\,\pi^*(\alpha \mid y)\,d\alpha, \qquad (5.10)$$

where the first contribution to the integrand takes exactly the same form as (5.4). The contributions $a_{ij}$ and $b_{ij}$ in (5.5) and (5.6) should, however, now be computed using the derivatives

$$a_{ij} = \frac{\partial\log\pi}{\partial\xi_{ij}} = \alpha\log\theta_{ij} + \alpha\log(1 + \alpha) - \alpha\psi(y_{ij} + \alpha\xi_{ij}) \qquad (5.11)$$

and

$$b_{ij} = -\frac{\partial^2\log\pi}{\partial\xi_{ij}^2} = \alpha^2\psi'(y_{ij} + \alpha\xi_{ij}), \qquad (5.12)$$

instead of those in (5.7) and (5.8), where $\pi$ now denotes the density in (3.1).

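Consider next the cell probability $\phi^{AB}_{ij}$ in (1.3). Conditional upon $\alpha$ and $\beta$, the posterior distribution of $\phi^{AB}_{ij}$ is beta with parameters $y_{ij} + \alpha\xi_{ij}(\beta)$ and $\sum_{k,g}y_{kg} + \alpha\sum_{k,g}\xi_{kg}(\beta) - y_{ij} - \alpha\xi_{ij}(\beta)$, and density

$$\begin{aligned}\pi(\phi^{AB}_{ij} \mid y, \alpha, \beta) ={}& \exp\big\{[y_{ij} + \alpha\xi_{ij}(\beta) - 1]\log\phi^{AB}_{ij}\big\} \\ &\times\exp\Big\{\Big[\textstyle\sum_{k,g}y_{kg} + \alpha\sum_{k,g}\xi_{kg}(\beta) - y_{ij} - \alpha\xi_{ij}(\beta) - 1\Big]\log\big(1 - \phi^{AB}_{ij}\big)\Big\} \\ &\times\exp\Big\{-\log B\Big(y_{ij} + \alpha\xi_{ij}(\beta),\ \textstyle\sum_{k,g}y_{kg} + \alpha\sum_{k,g}\xi_{kg}(\beta) - y_{ij} - \alpha\xi_{ij}(\beta)\Big)\Big\}\end{aligned} \qquad (0 < \phi^{AB}_{ij} < 1), \qquad (5.13)$$

where $B(u, v) = \Gamma(u)\Gamma(v)/\Gamma(u + v)$.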

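This replaces the similar density for $\rho_{ij}$ in (5.1), and the unconditional density may now be computed from

$$\pi^*(\phi^{AB}_{ij} \mid y) = \int_0^\infty\pi^*(\phi^{AB}_{ij} \mid y, \alpha)\,\pi^*(\alpha \mid y)\,d\alpha \qquad (0 < \phi^{AB}_{ij} < 1),$$

where the first contribution to the integrand again takes the same form as (5.4). The quantities in (5.5) and (5.6) should now be computed using the derivatives

$$\frac{\partial\log\pi}{\partial\xi_{ij}} = \alpha\log\phi^{AB}_{ij} - \alpha\psi(y_{ij} + \alpha\xi_{ij}) + \alpha\psi\Big(\sum_{k,g}y_{kg} + \alpha\sum_{k,g}\xi_{kg}\Big)$$

and

$$-\frac{\partial^2\log\pi}{\partial\xi_{ij}^2} = \alpha^2\psi'(y_{ij} + \alpha\xi_{ij}) - \alpha^2\psi'\Big(\sum_{k,g}y_{kg} + \alpha\sum_{k,g}\xi_{kg}\Big).$$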

Furthermore, the conditional posterior distribution, given $\alpha$ and $\beta$, of the conditional cell probability $\phi^B_{ij}$ is beta with parameters $y_{ij} + \alpha\xi_{ij}(\beta)$ and $\sum_g y_{ig} + \alpha\sum_g\xi_{ig}(\beta) - y_{ij} - \alpha\xi_{ij}(\beta)$. Therefore the marginal posterior density of $\phi^B_{ij}$ in (1.7) may be approximated in identical fashion to the distribution for $\phi^{AB}_{ij}$, but with $\sum_{k,g}y_{kg} + \alpha\sum_{k,g}\xi_{kg}(\beta)$ replaced by $\sum_g y_{ig} + \alpha\sum_g\xi_{ig}(\beta)$.


6. Linear Combinations of the Conditional Cell Probabilities


It is of some interest to consider parameters of the form

$$\eta = \sum_{i=1}^r a_i\phi^B_{ij}, \qquad (6.1)$$

e.g., the average conditional row probability for a particular column (with $a_i = 1/r$), since this may be a relevant quantity in related tables.


We firstly consider a general problem which will help us with the first stage of the situation at hand. Suppose that $\phi_1, \ldots, \phi_r$ possess independent beta distributions, with respective parameters $(\alpha_1, \beta_1), \ldots, (\alpha_r, \beta_r)$ (generic parameters, unrelated to $\alpha$ and $\beta$ above); then we can find an approximation to the distribution of

$$\eta = \sum_i a_i\phi_i. \qquad (6.2)$$


Let $\tau_i = \log\phi_i - \log(1 - \phi_i)$. Then the joint distribution of $\tau_1, \ldots, \tau_r$ is

$$\pi(\tau) = \exp\Big\{\sum_i\alpha_i\tau_i - \sum_i(\alpha_i + \beta_i)\log(1 + e^{\tau_i})\Big\}\Big/\prod_i B(\alpha_i, \beta_i). \qquad (6.3)$$
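
Each factor of (6.3) is the beta density transformed to the logit scale, with Jacobian $d\phi/d\tau = \phi(1 - \phi)$. A short check, assuming SciPy and hypothetical parameters $(\alpha_i, \beta_i) = (7, 3)$:

```python
import numpy as np
from scipy.special import betaln
from scipy.stats import beta as beta_dist

def log_density_logit(tau, a, b):
    """log of one factor of (6.3): density of tau = log(phi / (1 - phi)) when phi ~ Beta(a, b)."""
    return a * tau - (a + b) * np.log1p(np.exp(tau)) - betaln(a, b)

a, b = 7.0, 3.0                              # hypothetical beta parameters (alpha_i, beta_i)
tau = np.linspace(-6.0, 8.0, 7)
phi = 1.0 / (1.0 + np.exp(-tau))
# change of variables: pi(tau) = Beta pdf(phi) * d phi / d tau, with d phi / d tau = phi (1 - phi)
direct = beta_dist.pdf(phi, a, b) * phi * (1.0 - phi)
print(np.allclose(np.exp(log_density_logit(tau, a, b)), direct))   # True
```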


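As before, we maximize (6.3) with respect to $\tau_1, \ldots, \tau_r$, but subject to the constraint that

$$\sum_i a_i\phi_i = \eta. \qquad (6.4)$$

This yields the equation

$$\phi_i = \frac{\alpha_i + \lambda a_i}{\alpha_i + \beta_i}, \qquad (6.5)$$

where $\lambda$ is a Lagrange multiplier.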
to the marginal density of η, where λ is given explicitly in (6.6). This approximation may be modified by a determinant term, yielding the explicit final approximation
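$$\pi^*(\eta\mid\boldsymbol{\alpha},\boldsymbol{\beta})\propto\prod_{k=1}^{r}\frac{(\alpha_k+\lambda a_k)^{\alpha_k-\frac{1}{2}}\,(\beta_k-\lambda a_k)^{\beta_k-\frac{1}{2}}}{B(\alpha_k,\beta_k)},\qquad\lambda=\Big(\eta-\sum_i\frac{a_i\alpha_i}{\alpha_i+\beta_i}\Big)\Big/\sum_i\frac{a_i^2}{\alpha_i+\beta_i}\qquad(6.8)$$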


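Returning to our specific situation, we may approximate the conditional posterior distribution of η in (6.1) by a distribution of the form (6.8), but with $(\alpha_1,\beta_1),\ldots,(\alpha_r,\beta_r)$ respectively replaced by $(y_{1j}+\alpha\xi_{1j},\ y_{1.}+\alpha\xi_{1.}-y_{1j}-\alpha\xi_{1j}),\ldots,(y_{rj}+\alpha\xi_{rj},\ y_{r.}+\alpha\xi_{r.}-y_{rj}-\alpha\xi_{rj})$, where $\xi_{ij}=\xi_{ij}(\beta)$, $y_{i.}=\sum_g y_{ig}$ and $\xi_{i.}=\sum_g\xi_{ig}(\beta)$.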


It is straightforward to extend this to obtain an approximation to the marginal posterior distribution of η, unconditional on α and β: simply follow the general procedure of section 5.
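The constrained point and the resulting density of η can be explored numerically. The sketch below is a minimal illustration, assuming the linear form $\phi_i=(\alpha_i+\lambda a_i)/(\alpha_i+\beta_i)$, with λ chosen explicitly so that $\sum_i a_i\phi_i=\eta$. It evaluates only the profile term, that is, the joint density (6.3) at that point, which equals $\prod_i\phi_i^{\alpha_i}(1-\phi_i)^{\beta_i}/B(\alpha_i,\beta_i)$, and normalizes it on a grid. The determinant modification is deliberately left out. Function names and the example parameters are hypothetical.

```python
import math


def lagrange_lambda(eta, a, alpha, beta):
    """Explicit multiplier making sum_i a_i * phi_i equal to eta."""
    centre = sum(ai * al / (al + be) for ai, al, be in zip(a, alpha, beta))
    scale = sum(ai * ai / (al + be) for ai, al, be in zip(a, alpha, beta))
    return (eta - centre) / scale


def constrained_phi(eta, a, alpha, beta):
    """phi_i = (alpha_i + lambda * a_i) / (alpha_i + beta_i)."""
    lam = lagrange_lambda(eta, a, alpha, beta)
    return [(al + lam * ai) / (al + be) for ai, al, be in zip(a, alpha, beta)]


def log_profile(eta, a, alpha, beta):
    """Log of (6.3) at the constrained point; -inf if some phi_i leaves (0, 1)."""
    total = 0.0
    for phi, al, be in zip(constrained_phi(eta, a, alpha, beta), alpha, beta):
        if not 0.0 < phi < 1.0:
            return -math.inf
        log_b = math.lgamma(al) + math.lgamma(be) - math.lgamma(al + be)
        total += al * math.log(phi) + be * math.log1p(-phi) - log_b
    return total


def grid_density(a, alpha, beta, n_grid=4000):
    """Normalize exp(log_profile) on a midpoint grid covering the range of eta."""
    lo = sum(min(ai, 0.0) for ai in a)
    hi = sum(max(ai, 0.0) for ai in a)
    step = (hi - lo) / n_grid
    etas = [lo + (k + 0.5) * step for k in range(n_grid)]
    logs = [log_profile(e, a, alpha, beta) for e in etas]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # exp(-inf) gives 0.0
    norm = sum(weights) * step
    return etas, [w / norm for w in weights], step


if __name__ == "__main__":
    a = [1 / 3] * 3  # average of three conditional probabilities
    alpha, beta = [20.0, 35.0, 12.0], [30.0, 25.0, 40.0]  # hypothetical parameters
    etas, dens, step = grid_density(a, alpha, beta)
    print(sum(e * d for e, d in zip(etas, dens)) * step)
```

In the contingency-table application, `alpha` and `beta` would hold the replaced parameters listed above for a fixed column j.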


Linear  Combinations  in  Log  Space 

Consider next the general problem where $\theta_1,\ldots,\theta_m$ possess independent Gamma distributions with respective (shape, rate) parameters $(n_1,\alpha),\ldots,(n_m,\alpha)$, and it is required to approximate the distribution of


$$\zeta=\sum_i a_i\gamma_i\qquad(7.1)$$

where $\gamma_i=\log\theta_i$.

The joint distribution of $\gamma_1,\ldots,\gamma_m$ is

$$\pi(\boldsymbol{\gamma})=\exp\Big\{\sum_i\gamma_i n_i-\alpha\sum_i e^{\gamma_i}\Big\}\times\exp\Big\{\log\alpha\sum_i n_i-\sum_i\log\Gamma(n_i)\Big\}$$


Maximizing with respect to $\gamma_1,\ldots,\gamma_m$, subject to the constraint in (7.1), yields the equations

$$\gamma_i=\log\{(n_i+\lambda a_i)/\alpha\}\qquad(i=1,\ldots,m)$$


where the Lagrange multiplier λ satisfies

$$\zeta=\sum_i a_i\log\{(n_i+\lambda a_i)/\alpha\}\approx\sum_i a_i\log(n_i/\alpha)+\lambda\sum_i a_i^2/n_i$$


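We hence approximate the posterior density of ζ by

$$\pi^*(\zeta)\propto\prod_i\Big\{\frac{n_i+\tilde\lambda a_i}{\alpha}\Big\}^{n_i}\exp\Big\{-\sum_i n_i-\tilde\lambda\sum_i a_i\Big\}\times\exp\Big\{\log\alpha\sum_i n_i-\sum_i\log\Gamma(n_i)\Big\}\qquad(7.6)$$

where $\tilde\lambda=\Big\{\zeta-\sum_k a_k\log(n_k/\alpha)\Big\}\Big/\sum_k a_k^2/n_k$.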

i 


This approximation is applicable to our contingency table situation by instead taking the summations and products to be over i = 1, ..., r and j = 1, ..., s, and replacing $n_k$ by $y_{ij}+\alpha\xi_{ij}(\beta)$, the common rate α by $\alpha+1$, and $a_k$ by $a_{ij}$. Then (7.6) provides an approximation to the posterior density, given α and β, of

$$\zeta=\sum_{ij}a_{ij}\log\theta_{ij}=\sum_{ij}a_{ij}\gamma_{ij}\qquad(7.7)$$
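For the general log-space problem, the first-order multiplier $\tilde\lambda$ can be compared with an exact root of the constraint equation. The sketch below is illustrative only. It assumes independent Gamma variables with shapes `n` and a common rate `rate`, and finds the exact multiplier by bisection, which is our choice rather than the paper's. It then evaluates the unnormalized log of (7.6), written out as $\sum_i[n_i\log\{(n_i+\tilde\lambda a_i)/\alpha\}-(n_i+\tilde\lambda a_i)+n_i\log\alpha-\log\Gamma(n_i)]$. All names and numbers are hypothetical. Following the replacement described above, `n` would hold $y_{ij}+\alpha\xi_{ij}(\beta)$ and `rate` would be $\alpha+1$ in the table setting.

```python
import math


def zeta_hat(n, a, rate):
    """Centre of the approximation: sum_k a_k log(n_k / rate)."""
    return sum(ak * math.log(nk / rate) for nk, ak in zip(n, a))


def lambda_linear(zeta, n, a, rate):
    """First-order multiplier from linearizing the constraint in lambda."""
    return (zeta - zeta_hat(n, a, rate)) / sum(ak * ak / nk for nk, ak in zip(n, a))


def lambda_exact(zeta, n, a, rate, iters=200):
    """Solve sum_i a_i log((n_i + lam a_i) / rate) = zeta by bisection.

    The left side increases in lam wherever every n_i + lam a_i > 0.
    """
    lo = max((-ni / ai for ni, ai in zip(n, a) if ai > 0), default=-1e6)
    hi = min((-ni / ai for ni, ai in zip(n, a) if ai < 0), default=1e6)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(ai * math.log((ni + mid * ai) / rate) for ni, ai in zip(n, a) if ai != 0)
        if g < zeta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_density(zeta, n, a, rate):
    """Unnormalized log of (7.6); -inf where some n_i + lambda a_i <= 0."""
    lam = lambda_linear(zeta, n, a, rate)
    total = 0.0
    for ni, ai in zip(n, a):
        m = ni + lam * ai
        if m <= 0:
            return -math.inf
        total += ni * math.log(m / rate) - m + ni * math.log(rate) - math.lgamma(ni)
    return total


if __name__ == "__main__":
    n, a, rate = [12.0, 7.0, 20.0, 5.0], [1.0, -1.0, -1.0, 1.0], 2.0  # hypothetical
    z = zeta_hat(n, a, rate) + 0.1
    print(lambda_linear(z, n, a, rate), lambda_exact(z, n, a, rate))
```

Near $\hat\zeta=\sum_k a_k\log(n_k/\alpha)$ the two multipliers agree closely, which is the regime in which the linearization behind (7.6) is expected to work well.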


For example, α = 0 provides the uninformative prior situation where no reduced model is incorporated. This assumption is particularly useful when r and s are small. In this case it is meaningful, rather than considering the interaction effect via (2.9), to find directly the approximate posterior distribution of, say,

$$\lambda^{AB}_{kg}=\gamma_{kg}-\gamma_{k.}-\gamma_{.g}+\gamma_{..}$$

where dots denote averages over the corresponding subscripts.


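This may be achieved by setting

$$\begin{aligned}
a_{kg}&=1-s^{-1}-r^{-1}+r^{-1}s^{-1},\\
a_{kj}&=-(s^{-1}-r^{-1}s^{-1}) &&(\text{all } j\neq g),\\
a_{ig}&=-(r^{-1}-r^{-1}s^{-1}) &&(\text{all } i\neq k),\\
a_{ij}&=r^{-1}s^{-1} &&(\text{all } i\neq k,\ j\neq g)
\end{aligned}\qquad(7.8)$$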


in the approximation (7.6), which now reduces (as $\sum_{ij}a_{ij}=0$) to

$$\pi^*(\zeta\mid\mathbf{y})\propto\prod_{ij}\Big\{1+\frac{a_{ij}(\zeta-\hat\zeta)}{v\,y_{ij}}\Big\}^{y_{ij}}$$

where $\hat\zeta=\sum_{ij}a_{ij}\log y_{ij}$ and $v=\sum_{ij}a_{ij}^2/y_{ij}$.
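The α = 0 reduction can be explored numerically. The following minimal sketch builds the coefficients (7.8) for one cell and checks that $\sum_{ij}a_{ij}\gamma_{ij}$ reproduces the interaction contrast. It then evaluates the unnormalized log of the reduced density, assumed here to take the form $\prod_{ij}\{1+a_{ij}(\zeta-\hat\zeta)/(v\,y_{ij})\}^{y_{ij}}$, on a grid. The sketch assumes every $y_{ij}>0$ (so that $\log y_{ij}$ exists) and uses 0-based indices. The 3x4 counts are hypothetical. Because the second derivative of this log density at $\hat\zeta$ equals $-1/v$, the grid variance should come out close to v.

```python
import math


def interaction_coeffs(r, s, k, g):
    """Coefficients a_ij of (7.8) for the (k, g) interaction, 0-based indices."""
    coeffs = [[1.0 / (r * s)] * s for _ in range(r)]
    for j in range(s):
        coeffs[k][j] = -(1.0 / s - 1.0 / (r * s))
    for i in range(r):
        coeffs[i][g] = -(1.0 / r - 1.0 / (r * s))
    coeffs[k][g] = 1.0 - 1.0 / s - 1.0 / r + 1.0 / (r * s)
    return coeffs


def centre_and_scale(y, coeffs):
    """zeta_hat = sum a_ij log y_ij and v = sum a_ij^2 / y_ij."""
    pairs = [(aij, yij) for ra, ry in zip(coeffs, y) for aij, yij in zip(ra, ry)]
    zhat = sum(aij * math.log(yij) for aij, yij in pairs)
    v = sum(aij * aij / yij for aij, yij in pairs)
    return zhat, v


def log_posterior(zeta, y, coeffs):
    """Unnormalized log density of zeta in the alpha = 0 case."""
    zhat, v = centre_and_scale(y, coeffs)
    total = 0.0
    for ra, ry in zip(coeffs, y):
        for aij, yij in zip(ra, ry):
            term = 1.0 + aij * (zeta - zhat) / (v * yij)
            if term <= 0.0:
                return -math.inf  # outside the support of the approximation
            total += yij * math.log(term)
    return total


if __name__ == "__main__":
    y = [[50, 60, 40, 30], [45, 70, 55, 35], [60, 40, 50, 65]]  # hypothetical counts
    coeffs = interaction_coeffs(3, 4, 0, 0)
    print(centre_and_scale(y, coeffs))
```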

The approximation in (7.6) may also be employed for general α and β, and when α and β possess the posterior density in (3.3). In this case the marginal posterior density of ζ, unconditional on α and β, may again be approximated using the general techniques of section 5.


Prediction 

Consider the prediction of a future frequency $z_{ij}$ for the (i, j)th cell when the future grand total for the table is fixed to be $\sum_{kg}z_{kg}=m$. Then it is appropriate to take $z_{ij}$ to possess a binomial distribution with cell probability $\phi_{ij}$, defined by (1.3), and sample size m.

The posterior distribution of $\phi_{ij}$, given α and β, is beta with parameters $y_{ij}+\alpha\xi_{ij}(\beta)$ and $\sum_{kg}y_{kg}+\alpha\sum_{kg}\xi_{kg}(\beta)-y_{ij}-\alpha\xi_{ij}(\beta)$. Hence the predictive distribution of $z_{ij}$, given the y's and α and β, is


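$$p(z_{ij}\mid\mathbf{y},\alpha,\beta)=\frac{B\big(z_{ij}+y_{ij}+\alpha\xi_{ij}(\beta),\ m+\sum_{kg}y_{kg}+\alpha\sum_{kg}\xi_{kg}(\beta)-z_{ij}-y_{ij}-\alpha\xi_{ij}(\beta)\big)}{(m+1)\,B(z_{ij}+1,\ m-z_{ij}+1)\,B\big(y_{ij}+\alpha\xi_{ij}(\beta),\ \sum_{kg}y_{kg}+\alpha\sum_{kg}\xi_{kg}(\beta)-y_{ij}-\alpha\xi_{ij}(\beta)\big)}\qquad(z_{ij}=0,1,\ldots,m)$$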


The predictive distribution $p(z_{ij}\mid\mathbf{y})$, unconditional upon α and β, may be approximated by again following the general techniques of section 5. It is possible to predict $z_{ij}$ in very similar fashion when its row total is fixed.
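The conditional predictive distribution is a beta-binomial. The short sketch below evaluates it on the log scale using the identity $\binom{m}{z}=1/\{(m+1)B(z+1,m-z+1)\}$. Function names and the example inputs are hypothetical, and α and β are treated as fixed, so the sketch covers the conditional case only, not the marginalization over α and β.

```python
import math


def log_beta(u, v):
    """log B(u, v) via lgamma."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)


def posterior_beta(y_ij, y_tot, xi_ij, xi_tot, alpha):
    """Beta parameters of phi_ij given alpha and beta (through the xi's)."""
    a = y_ij + alpha * xi_ij
    return a, y_tot + alpha * xi_tot - a


def predictive_pmf(z, m, a, b):
    """Beta-binomial probability of z future counts in the cell out of m."""
    log_p = (log_beta(z + a, m - z + b) - log_beta(a, b)
             - math.log(m + 1) - log_beta(z + 1, m - z + 1))
    return math.exp(log_p)


if __name__ == "__main__":
    a, b = posterior_beta(12, 200, 0.05, 1.0, 30.0)  # hypothetical y_ij, sum y, xi_ij, sum xi, alpha
    print([round(predictive_pmf(z, 25, a, b), 4) for z in range(6)])
```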


A  Practical  Case  Study 

The data in Table 1 comprise a 12x8 table cross-classifying 5648 examinees by school (A, B, ..., L) and aptitude grade (1, 2, ..., 8) on a military aptitude test taken prior to entering one of the twelve schools. The first subrow of each row gives the observed frequencies, the second subrow gives the observed conditional row percentages, and the third subrow gives initial smoothed cell frequencies, discussed in the fourth paragraph of this section. It is of interest to compare the effects of the selection procedures (a combination of school policy and student choice) upon the entry abilities of the various schools. The grade point boundaries are 70, 80, ...

The  analysis  is  intended  to  be  preliminary  to  a  full  study  of  a  12x8x10 
table  also  classifying  according  to  a  criterion  grade  obtained  by  the 
students  upon  graduation  from  one  of  the  twelve  schools. 

The  primary  result  of  our  analyses  was  that  the  12x8  table  may  be 
collapsed  into  the  3x2  cross-classification  described  in  Table  4,  i.e. 

(a) For comparison of schools it is reasonable to consider the conditional probability of obtaining one of the highest three grades (1, 2, 3).

(b) Four schools (B, C, E, I) are above average, with an average probability of 0.571 for those three grades.

(c) Four schools (A, D, G, H) are average, with an average probability of 0.495.

(d) Four schools (F, J, K, L) are below average, with an average probability of 0.339.

Our  analysis  proceeded  upon  the  following  lines: 

(I)  A  Bayesian  explanatory  interaction  analysis  which  highlighted  the 
good  and  bad  schools  together  with  the  relevant  grades. 

(II)  Collapsing  the  12x8  table  to  a  12x2  table,  combining  grades 
1,2  and  3  and  grades  4, 5, 6, 7,  and  8. 

(III)  Bayesian  and  significance  testing  investigations  as  to  whether 
the  12x2  table  could  be  reduced  to  a  3x2  table. 

(IV) Calculation of the posterior distributions of 12 conditional probabilities (of obtaining one of grades 1, 2, and 3 at each of the schools). These, roughly speaking, compromise between the 12x2 table and the 3x2 table in the ratio 2:3.

For step (I) the posterior density of $t=\alpha/(1+\alpha)$ was calculated from the approximation in (4.10), following the method described in section 3 and smoothing the whole 12x8 table towards independence. The chi-square value was $X^2=456.93$ on 77 degrees of freedom. This density is described as curve A of Figure 1 and possesses mean 0.172. Our analysis therefore suggests a compromise between the saturated interaction model and the independence model in the ratio 83:17, thus refuting independence across the whole table. The corresponding posterior means of the cell frequencies are described in the third subrows of Table 1; simpler results are, however, obtained below.
Approximations to the posterior densities of all interaction effects were obtained from (5.9), using (4.13) and (4.14) to approximate the quantities appearing in (5.4). For example, the eight posterior densities for school L are given in Figure 1, and are numbered according to grades 1 to 8.

Note  that,  for  school  L,  the  interactions  for  the  highest  grades  1-3 
are  clearly  negative,  while  those  for  the  lowest  grades  6-8  are  clearly 
positive.  The  interactions  for  grades  4  and  5  seem  positive,  but  a  formal 
judgment  might  involve  the  precise  specification  of  the  size  of  a  Bayesian 
test.  As  the  locations  of  these  densities  are  close  to  zero,  we  prefer  to 
simply  make  the  practical  judgment  that  the  interactions  are  probably  positive 
but  that  the  evidence  is  inconclusive. 

Note that, for school L, there is a zero count for grade 1, but the posterior distribution of the interaction effect is still proper. While fairly flat, this distribution still gives substantial evidence that the interaction effect is negative. This need not, however, always be the case for zero cell counts (e.g., if there were few observations in the same row and column, the interaction could still be zero).

There  is  therefore  substantial  evidence  that  school  L  is  below  par 
among  the  twelve  schools.  Similar  graphics  were  obtained  for  each  of  the 
other  eleven  schools  in  turn.  The  results  are  summarized  in  Table  2;  -,  0, 
or  +  indicates  clear  negative,  zero,  or  positive  interactions.  A  box  around 
either  of  these  symbols  means  that  a  precise  judgment  cannot  be  made  without 
more  formalism  but  that  this  is  our  practical  judgment  of  the  interaction. 

The  most  striking  aspect  of  Table  2  is  the  clear  demarcation  between 
the  third  and  fourth  grades  (grade  point  boundary  =  110  average  score 
across  all  schools).  Schools  with  positive  interactions  in  the  first  three 
grades tend to have negative interactions in the last five grades, and vice
versa.  There  is  therefore  clear  evidence  that  when  comparing  schools 
(rather  than  assessing  students)  we  should  count  high  proportions  in  the 
first  three  grades  as  good,  high  proportions  in  the  last  five  grades  as  bad, 
and  vice  versa. 

Table  2  can  also  be  used  to  assess  the  relative  merits  of  the  different 
schools  if  various  aspects  of  the  raw  data  (e.g.,  sample  sizes  and  percentages 
for  first  three  grades)  are  taken  into  account.  Such  considerations  motivated 
us  to  partition  the  12x8  table  into  three  4x8  tables  corresponding  to  the 
good  (B,C,E,I)  schools,  the  average  (A,D,G,H)  schools,  and  the  below  average 
(F,J,K,L)  schools.  For  example,  school  C  is  preferred  to  school  D  for  group  1 
because  of  its  superior  interaction  structure  and  because  its  observed 
percentage  of  0.550  for  the  first  grade  is  based  on  a  much  larger  sample 
size  than  the  value  0.529  for  school  D.  The  interactions  of  course  become 
more  significant  due  to  the  larger  sample  sizes. 

It was of interest to investigate whether these three subtables exhibited separate independence of rows and columns. We therefore calculated the posterior densities of $t=\alpha/(1+\alpha)$, performing our previous analysis for each subtable individually. The posterior density (B2) for the second subtable (the average schools) is described in Figure 1, and corresponds to a posterior mean of 0.78.

This suggests that the saturated and independence models should be weighted in the ratio 1:4, and therefore provides substantial evidence in favor of independence of performance for the average schools when all eight grades are taken into account. However, a similar result does not hold for the good and below average schools, since the posterior densities B1 and B3 in Figure 1 correspond to posterior means of 0.36 and 0.27, refuting independence.

Even when independence is refuted for a full table, the rows and columns of a collapsed table summarising the important features of the original table may still be independent; i.e., non-independence in the original table may be due to local fluctuations between adjacent cells rather than to an important global aspect.

Therefore, at Step (II) of the analysis, we utilized the key conclusion of the interaction analysis by collapsing the whole 12x8 table into a 12x2 table, where the first column combines the first three grades and the second column combines the last five grades. Collapsed table A is described in Table 3. It may be regarded as comprising three 4x2 subtables, corresponding to the good, average, and below average schools.
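The collapsing step and the subtable test can be sketched as follows. This is an illustration only: the counts below are hypothetical, not rows of Table 1. Pearson's $X^2$ is assumed as the frequentist chi-square statistic; the text says only "chi-square". For a 4x2 subtable the statistic has (4 - 1)(2 - 1) = 3 degrees of freedom.

```python
def collapse(row, split=3):
    """Collapse one school's grade counts into [grades 1..split, remaining grades]."""
    return [sum(row[:split]), sum(row[split:])]


def pearson_x2(table):
    """Pearson chi-square and degrees of freedom for row/column independence."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            x2 += (obs - expected) ** 2 / expected
    return x2, (len(table) - 1) * (len(col_tot) - 1)


if __name__ == "__main__":
    # Hypothetical 8-grade counts for four schools (NOT the data of Table 1).
    rows = [
        [10, 40, 60, 70, 30, 10, 5, 3],
        [12, 35, 55, 80, 25, 12, 6, 2],
        [8, 45, 50, 65, 35, 9, 4, 4],
        [11, 38, 58, 75, 28, 11, 7, 3],
    ]
    subtable = [collapse(r) for r in rows]
    print(subtable, pearson_x2(subtable))
```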

We now obtain independence of rows and columns for each of the three subtables under either a frequentist or a Bayesian analysis. For the three tables, the values of chi-square with 3 degrees of freedom are 6.87, 1.08, and 8.21, with respective p-values of .08, .78, and .04. The p-value of .04 for the below average group could be made substantially larger by omitting school K and putting it in its own (inferior) group. However, the overall value 16.16 of chi-square with 9 degrees of freedom is anyway as large as 0.27, even though the sample sizes are very large. Our conclusion is supported by the Bayesian posterior distribution of t, which in this case yields a posterior mean of 0.60 and weights the saturated model for the 12x2 table and the model with independence of rows and columns for each of the three 4x2 subtables in the ratio 2:3.¹

¹ Leonard (1983) argues that the value of $\chi^2$ corresponding to $E(t\mid\chi^2)=0.5$ is an appropriate critical value; on 9 degrees of freedom this value corresponds to the 78.8th percentile. On 3 degrees of freedom it is, remarkably, 7.81, corresponding to the 95th percentile. Therefore our frequentist and Bayesian procedures give roughly the same validation when considering each 4x2 table individually.
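The p-values quoted for the three 4x2 subtables can be reproduced with the closed-form chi-square upper tail for integer degrees of freedom. The same function confirms that 7.81 is the 95th percentile of chi-square on 3 degrees of freedom. This is a minimal standard-library sketch; the function name is ours, and a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_j x^(j-1) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


if __name__ == "__main__":
    for stat in (6.87, 1.08, 8.21):
        print(stat, round(chi2_sf(stat, 3), 3))
```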

At  Step  (III)  of  the  analysis  we  may  follow  the  conclusion  of  Step  (II) 
by  finally  collapsing  the  12x2  table  into  the  3x2  collapsed  table  B  described  in 
Table  4.  This  provides  a  very  simple  summary  of  the  main  features  of  Table  1. 

At  Step  (IV)  we  obtained  the  posterior  distribution  of  the  probabilities, 
for  each  of  the  twelve  schools  separately,  of  obtaining  one  of  the  first  three 
grades.  Here  the  procedures  of  section  5  were  applied  to  the  frequencies  in  Table 
3,  but  where  Table  4  represents  the  reduced  model.  The  posterior  means  of  the 
cell  probabilities  are  described  in  Table  5.  The  full  posterior  densities  are 
available  upon  request. 

The data base considered in this section has been analyzed by several previous authors, e.g., Sims and Hiatt (1981) and Dunbar and Novick (1984), who, for various reasons, selected the population so as to omit about 1000 of the students. The larger data set is of lower quality; however, if these extra students are included, very similar conclusions are reached.
Table 1

School A
  Observed frequencies     20       179      276      316      123      27
  Row percentages          (2.09)   (18.74)  (28.90)  (33.09)  (12.88)  (2.83)
  Smoothed frequencies     19.05    177.72   275.06   313.17   122.08   31.45

School B
  Observed frequencies     10       80       112      112      6        5
  Row percentages          (3.07)   (24.54)  (34.36)  (34.36)  (1.84)
Table 3

Collapsed Table A

School    Grades 1-3    Grades 4-8
E         89 (0.601)    59

Table 4

Collapsed Table B

Table 5

Posterior Means of Success Probabilities

FIGURE 1: POSTERIOR DENSITIES OF t = α/(1+α)